318 research outputs found

From Conventional Data Analysis Methods to Big Data Analytics

Author: Saporta Gilbert
Publication venue: Wiley
Publication date: 19/01/2018

Data analysis in this chapter mainly means descriptive and exploratory methods, also known as unsupervised methods. The objective is to describe and structure a set of data that can be represented as a rectangular table crossing n statistical units and p variables. Data analysis methods are essentially dimension reduction methods, divided into two categories: factor methods, and unsupervised classification (clustering) methods. Data mining is a step in the knowledge discovery process that involves applying data analysis algorithms. It seeks predictive models of a response denoted Y, but from a very different perspective than that of conventional modeling. This chapter distinguishes regression methods, where Y is quantitative, from supervised classification methods (also called discrimination methods), where Y is categorical, most often with two modalities. The chapter also discusses new tools for big data processing, based on validation with data set aside.

Hal-Diderot

Clusterwise methods, past and present

Author: Saporta Gilbert
Publication venue: HAL CCSD
Publication date: 01/07/2017

Instead of fitting a single global model (regression, PCA, etc.) to a set of observations, clusterwise methods look simultaneously for a partition into k clusters and k local models optimizing some criterion. There are two main approaches: (1) the least squares approach introduced by E. Diday in the 1970s, derived from k-means; and (2) mixture models using maximum likelihood. Only the first easily enables prediction. After a survey of classical methods, we will present recent extensions to functional, symbolic and multiblock data.

50 Years of Data Analysis: From Exploratory Data Analysis to Predictive Modeling and Machine Learning

Author: Saporta Gilbert
Publication venue: ISTE-Wiley
Publication date: 01/01/2019

Hal-Diderot

Une brève histoire de l'apprentissage

Author: Saporta Gilbert
Publication venue: Editions Technip
Publication date: 01/01/2018

Quelle statistique pour les Big Data?: Entretien avec Gilbert SAPORTA

Author: Saporta Gilbert
Publication venue: Société française de statistique
Publication date: 01/01/2017

Everyone is interested in Big Data. The public is increasingly well informed about the potential that massive data holds and about the dangers its use may entail. But very few people know what lies "under the hood" of the new methods. Statistique et Société asked Gilbert Saporta, who is one of those few, to shed as much light as possible on the subject for non-specialists.

A generalization of partial least squares regression and correspondence analysis for categorical and mixed data: An application with the ADNI data

Author: Beaton Derek
Abdi Hervé
Saporta Gilbert
Publication venue: Cold Spring Harbor Laboratory
Publication date: 07/02/2020

The present and future of large-scale studies of human brain and behavior in typical and disease populations is multi-omics, deep-phenotyping, or other types of multi-source and multi-domain data collection initiatives. These massive studies rely on highly interdisciplinary teams that collect extremely diverse types of data across numerous systems and scales of measurement (e.g., genetics, brain structure, behavior, and demographics). Such large, complex, and heterogeneous data require relatively simple methods that allow for flexibility in analyses without the loss of the inherent properties of various data types. Here we introduce a method designed

* Data used in preparation of this article were obtained from the Alzheimer's Disease Neuroimaging Initiative (ADNI) database (http://adni.loni.usc.edu/). As such, the investigators within the ADNI contributed to the design and implementation of ADNI and/or provided data but did not participate in analysis or writing of this report. A complete listing of ADNI investigators can be found a

Science des données, données massives : défis et nouveaux métiers

Author: Saporta Gilbert
Publication venue: HAL CCSD
Publication date: 03/05/2019

SFLM: A mix of a Functional Linear Model and of a Spatial Autoregressive Model for spatially correlated functional data

Author: Huang Tingting
Saporta Gilbert
Wang Huiwen
Wang Shanshan
Publication venue: HAL CCSD
Publication date: 31/08/2018

Hal-Diderot